add message about Linux FLM binary detection #1297
bluefalcon13 wants to merge 2 commits into lemonade-sdk:main
run lemonade-server with the environment variable
<code>LEMONADE_FLM_LINUX_BETA=1</code> set.
<br><br>
DOCKER USERS: The beta currently uses `which` to locate your flm binary.
@bluefalcon13 can you confirm that FLM works in the docker whatsoever? That would be a pleasant surprise.
Here is the dockerfile for the lemonade server docker I am using: https://github.com/bluefalcon13/local_ai_stack/blob/main/configs/lemonade/Dockerfile
The docker compose is at the project root.
Can 100% confirm: after a bunch of fighting, I have a functional lemonade docker, with a custom llama-cpp and flm built. I need to bump my max LLMs so I can run them concurrently, then it's more fighting to try to get FLM to act as a drafter. :D
What about the official docker released from this repo?
If you see where I am going with this: if we add a docker note to the website, people will think the built-in docker works with the NPU if they just do the one tip.
Any chance you want to update the mainline docker definition here to work with the NPU?
I might be able to. I ran into an issue with my Ubuntu docker (I am more familiar with Debian-based distros) because I moved up to Arch's mainline kernel. Ubuntu did NOT play nice with that, and building XRT (and its plugin) from source requires the kernel headers. Shortly after that, I moved the container to Arch.
> What about the official docker released from this repo?
How about we bundle FLM in that once it releases?
FWIW - I'm running natively in Arch myself using xrt and xdna-plugin that I uploaded MRs for.
You just need to build FLM, and there is an AUR package for that too: https://aur.archlinux.org/packages/fastflowlm-git
Yeah, there is, but in a docker, it's almost the same as pulling source and adding some tweaks :P
I did pull in XRT and the plugin though from extra-testing. Those are super annoying to build.
Can you help push those out of testing? I'm new to arch packaging. I'm not sure what is needed for that to happen.
I have no idea how to do that either. Looking over Arch's docs, the rules for core-testing are pretty clear, but they don't seem to be as strict for extra-testing > extra. https://wiki.archlinux.org/title/Official_repositories#extra-testing
> What about the official docker released from this repo?
> If you see where I am going with this: if we add a docker note to the website, people will think the built-in docker works with the NPU if they just do the one tip.
> Any chance you want to update the mainline docker definition here to work with the NPU?
After much pain, I can confirm, yes it does:
root@30c2954fe628:/opt/lemonade# flm validate
[Linux] Kernel: 7.0.0-rc2-1-mainline
[Linux] NPU: /dev/accel/accel0 with 8 columns
[Linux] NPU FW Version: 1.1.2.65
[Linux] amdxdna version: 0.6
[Linux] Memlock Limit: infinity
root@30c2954fe628:/opt/lemonade#
I inserted the following at line 67 of the Dockerfile. I have never built a .deb before, but in theory you could do that in a separate stage, then pull the .deb in and install it.
RUN apt update && apt install -y --no-install-recommends \
software-properties-common && add-apt-repository ppa:amd-team/xrt && \
apt update && apt install -y --no-install-recommends \
amdxdna-dkms build-essential cmake git g++ libavcodec-dev libavdevice-dev libavformat-dev \
libavutil-dev libboost-dev libboost-program-options-dev libcurl4-openssl-dev \
libdrm-dev libfftw3-dev libswscale-dev libxrt-dev libxrt-npu2 ninja-build \
uuid-dev && rm -fr /var/lib/apt/lists/*
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
RUN cd /opt && git clone --recursive https://github.com/FastFlowLM/FastFlowLM.git && \
cd /opt/FastFlowLM/src && cmake --preset linux-default -G Ninja \
-DCMAKE_BUILD_TYPE=Release && \
cmake --build build -j$(nproc) && \
cmake --install build
Additional verification:
root@30c2954fe628:/opt/lemonade# LEMONADE_FLM_LINUX_BETA=1 ./lemonade-server recipes
Recipe       Backend  Status           Message/Version                          Action
----------------------------------------------------------------------------------------------------------------------------------------------------
flm          npu      update_required  Backend update is required before use.   lemonade-server recipes --install flm:npu
kokoro       cpu      installable      Backend is supported but not installed.  lemonade-server recipes --install kokoro:cpu
llamacpp     system   unsupported      llama-server not found in PATH           -
             metal    unsupported      Requires macOS                           -
             vulkan   installable      Backend is supported but not installed.  lemonade-server recipes --install llamacpp:vulkan
             rocm     installable      Backend is supported but not installed.  lemonade-server recipes --install llamacpp:rocm
             cpu      installable      Backend is supported but not installed.  lemonade-server recipes --install llamacpp:cpu
ryzenai-llm  npu      unsupported      Requires Windows                         -
sd-cpp       rocm     installable      Backend is supported but not installed.  lemonade-server recipes --install sd-cpp:rocm
             cpu      installable      Backend is supported but not installed.  lemonade-server recipes --install sd-cpp:cpu
whispercpp   npu      unsupported      Requires Windows                         -
             vulkan   installable      Backend is supported but not installed.  lemonade-server recipes --install whispercpp:vulkan
             cpu      installable      Backend is supported but not installed.  lemonade-server recipes --install whispercpp:cpu
----------------------------------------------------------------------------------------------------------------------------------------------------
root@30c2954fe628:/opt/lemonade#
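If anyone wants to script this check instead of eyeballing the table, here is a rough sketch that pulls the Status value for the flm row out of the recipes output. It assumes the whitespace-separated column layout shown above (Recipe, Backend, Status as the first three fields); `flm_status` is a made-up helper name, not something lemonade-server ships.

```shell
#!/bin/sh
# Hypothetical helper: read `lemonade-server recipes` output on stdin and
# print the Status field of the flm row. Assumes Status is the third
# whitespace-separated field, as in the output pasted above.
flm_status() {
    awk '$1 == "flm" { print $3; exit }'
}

# Sample row copied from the output above; in the container you would pipe
# the real command into flm_status instead.
printf '%s\n' \
    'flm npu update_required Backend update is required before use. lemonade-server recipes --install flm:npu' \
    | flm_status
```

In the container this would look like `LEMONADE_FLM_LINUX_BETA=1 ./lemonade-server recipes | flm_status`. An empty result means no flm row was listed.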
docker run command used:
docker run -it --rm --device /dev/kfd --device /dev/dri --device /dev/accel/accel0 --ulimit memlock=-1:-1 --group-add $(getent group render | cut -d: -f3) --group-add $(getent group video | cut -d: -f3) --security-opt seccomp=unconfined --ipc=host lemonade:test bash
I did not run it myself, but that's because I am currently already running it in my Arch container, and I am not sure I want to find out how graceful that handoff is!
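For anyone trying the docker run command above on their own host, a small preflight sketch like this can report which of the passed-through device nodes are missing before the container starts. The three paths are the ones from the command; `missing_devices` is a hypothetical helper name, and this only checks that the paths exist, not that the driver behind them works.

```shell
#!/bin/sh
# Hypothetical helper: print each given path that does not exist on the
# host, one per line. Prints nothing when every path is present.
missing_devices() {
    for dev in "$@"; do
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
    return 0
}

# Device nodes taken from the docker run command above.
missing="$(missing_devices /dev/kfd /dev/dri /dev/accel/accel0)"
if [ -n "$missing" ]; then
    echo "missing on host:"
    echo "$missing"
else
    echo "all device nodes present"
fi
```

A missing /dev/accel/accel0 would suggest the amdxdna driver is not loaded on the host, in which case the `--device` flag has nothing to pass through.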

REF: #1296
Add a note for docker users regarding binary detection. I chose to address docker users because they are the most likely to not have `which` installed.
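To see which case a given container is in, something like the following could be run inside it. This is only a sketch of the situation the note describes: it checks for `which` itself first (via the POSIX `command -v` builtin), then asks `which` for the binary. `check_binary` is a made-up name, and how lemonade-server actually calls `which` internally is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: report whether `which` exists and whether it can
# locate the named binary.
#   prints "which-missing" (exit 2) if `which` is not installed
#   prints "found:<path>"  (exit 0) if `which` locates the binary
#   prints "not-found"     (exit 1) otherwise
check_binary() {
    name="$1"
    if ! command -v which >/dev/null 2>&1; then
        echo "which-missing"
        return 2
    fi
    if path="$(which "$name" 2>/dev/null)" && [ -n "$path" ]; then
        echo "found:$path"
        return 0
    fi
    echo "not-found"
    return 1
}

# `|| true` keeps the script going when flm is absent.
check_binary flm || true
```

A result of `which-missing` is the case the note warns about: flm may be installed, but the beta's detection would have no `which` to find it with.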